[rabit harden] remove rabit_num_trail from dmlc-core #512

Merged: 8 commits, Mar 14, 2019
Contributor

@chenqin chenqin commented Mar 8, 2019

Follow-up refactor based on review comments on #510.

  • Introduce --local-num-attempt to configure the total number of times the local tracker may restart workers. (For obvious reasons, this setting has little meaning in a distributed environment.)
  • Pass the updated DMLC_NUM_ATTEMPT value to the mocked rabit worker as the number of failed trials the local tracker has executed in the current job.

This PR is a prerequisite for re-enabling all rabit C++ tests (follow-up PR: dmlc/rabit#81).
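
For readers who want a picture of the two changes listed above, here is a minimal sketch of a restart loop, not the code in this PR. It assumes that DMLC_NUM_ATTEMPT counts the failed trials so far and that the restart limit comes from something like --local-num-attempt. The function name `run_with_attempts` and its parameters are hypothetical.

```python
import os
import subprocess
import sys


def run_with_attempts(cmd, max_restarts, base_env=None):
    """Run cmd, restarting it on failure up to max_restarts times.

    Hypothetical helper: the name, signature, and return value are
    illustrative and are not taken from tracker/dmlc_tracker/local.py.
    Returns (exit_code, failed_trials).
    """
    env = dict(os.environ if base_env is None else base_env)
    failed = 0
    while True:
        # Assumption: the worker reads the number of failed trials so far
        # from the DMLC_NUM_ATTEMPT environment variable.
        env["DMLC_NUM_ATTEMPT"] = str(failed)
        ret = subprocess.call(cmd, env=env)
        if ret == 0:
            return 0, failed
        failed += 1
        if failed > max_restarts:
            # Restart budget exhausted: report the last exit code.
            return ret, failed


if __name__ == "__main__":
    # Mock worker that fails until it has been restarted twice.
    worker = [
        sys.executable,
        "-c",
        "import os, sys; "
        "sys.exit(0 if int(os.environ['DMLC_NUM_ATTEMPT']) >= 2 else 1)",
    ]
    print(run_with_attempts(worker, max_restarts=3))
```

A mocked worker can use the value to decide when to stop simulating a failure, which appears to be the use the description has in mind.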

Contributor Author

chenqin commented Mar 11, 2019

@CodingCat @hcho3

Contributor Author

chenqin commented Mar 12, 2019

Passed the xgboost integration tests:
https://travis-ci.org/chenqin/xgboost/builds/505366624

Review comment on tracker/dmlc_tracker/local.py (outdated, resolved)
Contributor Author

chenqin commented Mar 14, 2019

^ @szha can we merge this to master? dmlc/rabit#82 points to this fix.

@szha szha merged commit e7d2014 into dmlc:master Mar 14, 2019
Contributor Author

chenqin commented Mar 14, 2019

Thanks!

3 participants